perm filename 106A26[1,RWF] blob sn#728172 filedate 1983-09-27 generic text, type T, neo UTF8
More Dynamic Programming.

I play a simple dice game with an opponent: he rolls two dice, then I roll
two dice, and we alternate until one of us has a total of 25 or more.  He
offers to bet three to one that he will win.  I should accept if my winning
chance is more than 0.25, and decline if less.
(This problem is a simplification of problems arising in backgammon endgames).

I can get a tolerable approximation of my chances by playing the game by
myself a few hundred times, but if the actual chance is within a few
percent of 0.25, I would need to play perhaps a thousand times to avoid a
significant chance of making the wrong decision.  I can program a computer
to make the same experiment, but it is easier, faster, and more reliable to
have it calculate the exact probabilities by another application of the
dynamic programming method.  Let f(A,B) be the chance for the first player
to win when he needs a total of A to win, and the second player needs a
total of B to win.  If A is zero or negative, f(A,B) = 1; if B is zero or
negative, f(A,B) = 0.  Otherwise, the first player rolls two dice, getting
numbers i and j between 1 and 6, and the second player now gets a turn, so
the second player's chance is f(B,A-i-j).  The two player's chances total
1.00, so the first player's chance after rolling the dice is 1-f(B,A-i-j).
The first player's chance before rolling the dice is found by averaging
over all the possible dice rolls; it is 

	1 - 1/36 (sigma) [1<=i<=6, 1<=j<=6] f(B, A-i-j)
			↑

I can calculate f(A,B) for each value of A and B up to 25, and save each in
a table as I go, for use in calculating f for larger values of A and B.  I
must be sure, though, that when I calculate f(A,B) I have already
calculated f(B,A-2) through f(B, A-12);  this can be calculated if I do the
calculation in the order

f(1,1)
f(2,1), f(1,2), f(2,2)
f(3,1), f(3,2), f(1,3), f(2,3), f(3,3)
f(4,1), f(4,2), f(4,3), f(1,4), f(2,4), f(3,4), f(4,4)
etc.

(Another satisfactory order would be by increasing sums of A and B)

The algorithm can be summarized this way:

	Initialize;

	FOR LARGER:= 1 TO 2.5 DO
		BEGIN
		FOR K:=1 TO LARGER-1 DO
			GETF(LARGER, K);
		FOR K:=1 TO LARGER DO
			GETF(K, LARGER)
		END;

	WRITE F[25,25]

where the procedure GETF(A,B) has this body:

	BEGIN
	SUM:=0,
	FOR I:=1 TO 6 DO
		FOR J:=1 TO 6 DO
			SUM:=SUM + F[B,A-I-J];
	F[A,B]:=1.00 -SUM/36.0
	END

Exercise:

Which values of the array F need to be initialized?  What  should the
subscript bounds on F be?

The theme of this example is that the easiest way to compute a function for
one assignment of argument values may be to compute a complete table of the
function.